Highly Accurate Mandarin Tone Classification in the Absence of Pitch Information
Abstract
A deep neural network (DNN) classifier based only on 40 mel-frequency cepstral coefficients (MFCCs) achieved a 29.99% frame error rate (FER) and a 16.86% segment error rate (SER) in recognizing five tonal categories in Mandarin Chinese broadcast news. With the addition of subband autocorrelation change detection (SACD) pitch-class features [1], the classifier scored 27.58% FER and 15.56% SER. These results are substantially better than the best previously reported results on broadcast news tone classification [2]. They are also better than those a human listener achieved when categorizing test stimuli created by amplitude- and frequency-modulating complex tones to match the extracted F0 and amplitude parameters [3]. The same DNN architecture performed substantially worse when trained and tested on SACD pitch-class parameters alone: 39.22% FER and 24.89% SER. RAPT F0 estimates did worse still: 44.37% FER and 27.28% SER. The 40 MFCC parameters do not encode F0 in any obvious way, and attempts to predict SACD or other pitch features from them perform poorly. These surprising results raise difficult questions for theories of Chinese tone.
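The two metrics reported above can be made concrete with a short sketch. FER counts misclassified frames; SER counts misclassified segments (syllables). The abstract does not say how frame-level decisions are pooled into a segment decision, so the code below assumes a simple majority vote over each segment's frames, and it assumes integer tone labels 1 to 5 (the four lexical tones plus the neutral tone). The function names and the toy data are hypothetical illustrations, not the authors' scoring code.

```python
from collections import Counter
from typing import Sequence, Tuple


def frame_error_rate(ref: Sequence[int], hyp: Sequence[int]) -> float:
    """Fraction of frames whose predicted tone differs from the reference tone."""
    if not ref or len(ref) != len(hyp):
        raise ValueError("ref and hyp must be non-empty and of equal length")
    errors = sum(r != h for r, h in zip(ref, hyp))
    return errors / len(ref)


def segment_error_rate(
    segments: Sequence[Tuple[int, int]],
    ref: Sequence[int],
    hyp: Sequence[int],
) -> float:
    """Fraction of segments (syllables) whose pooled prediction is wrong.

    Each segment is a (start, end) pair of frame indices, end exclusive.
    ASSUMPTION: the segment's predicted tone is the majority vote of its
    frame predictions, and its reference tone is the label of its first
    frame (the tone label is taken to be constant within a syllable).
    """
    if not segments:
        raise ValueError("segments must be non-empty")
    errors = 0
    for start, end in segments:
        ref_tone = ref[start]
        # Counter.most_common breaks ties in first-encountered order.
        hyp_tone = Counter(hyp[start:end]).most_common(1)[0][0]
        errors += hyp_tone != ref_tone
    return errors / len(segments)


# Hypothetical toy example: 3 syllables spread over 8 frames.
ref = [1, 1, 1, 2, 2, 2, 3, 3]
hyp = [1, 2, 1, 2, 2, 4, 4, 4]
segments = [(0, 3), (3, 6), (6, 8)]
print(frame_error_rate(ref, hyp))              # 4 of 8 frames wrong
print(segment_error_rate(segments, ref, hyp))  # 1 of 3 syllables wrong
```

In this toy case the vote absorbs the isolated frame errors in the first two syllables, so SER (1/3) comes out lower than FER (1/2). Pooling of this kind is one plausible reason segment-level error can fall below frame-level error, though the paper's actual pooling rule may differ.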
Related papers
A Pitch Smoothing Method for Mandarin Tone Recognition
Mandarin Chinese is a tonal language with four lexical tones. Tone recognition plays an important role in automatic Chinese speech recognition because the same syllable carries quite distinct meanings under different tones. Each tone can be characterized by its pitch contour, but pitch contours are rarely ideal smooth curves. This is because the pitch points calculated by pitch...
Pitch Contour Model for Chi Using Cart and Statis
This paper describes an approach to generating prosody parameters for a Mandarin Chinese text-to-speech system. The Chinese fundamental frequency contour is decomposed into two parts: a global intonation contour and a syllable-level tone contour. The global intonation contour is converted to pitch target labels in the corpus. It is predicted by first predicting pitch target labels using statistical m...
How Pitch Moves: Production of Cantonese Tones by Speakers with Different Tonal Experiences
This study investigates how native prosodic systems as well as L2 learning experience shape non-native tone production in terms of tone movement, a primary cue to tone identity. In an imitation task, the six Cantonese tones were produced by four speaker groups: native Mandarin speakers (tonal), native English speakers (non-tonal), native English speakers with Mandarin learning experience (L2 to...
Incorporating Pitch Features for Tone Modeling in Automatic Recognition of Mandarin Chinese
Tone plays a fundamental role in Mandarin Chinese, as it determines the lexical meanings of words in spoken Mandarin. For example, the two sentences meaning "I like horses" and "I like to scold" differ only in the tone carried by the last syllable. Thus, the inclusion of tone-related information through analysis of pitch data should improve the performance of automatic speech...
Question or tone 2? How language experience and linguistic function guide pitch processing
How does language experience shape pitch processing? Do speakers of tone languages, which use pitch to signal lexical contrasts (e.g., Mandarin Chinese), attend to pitch movements more closely than speakers of intonation languages (e.g., English)? Contradictory findings have been reported in the literature. In the current study, we hypothesize that listeners should be particularly attentive to a...